Display walltime benchmarks with subnanosecond precision #124774
Conversation
example results when benchmarking 1-4 serialized ADD instructions

```
running 4 tests
test add  ... bench:  0.24 ns/iter (+/- 0.00)
test add2 ... bench:  0.48 ns/iter (+/- 0.01)
test add3 ... bench:  0.72 ns/iter (+/- 0.01)
test add4 ... bench:  0.96 ns/iter (+/- 0.01)
```
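A minimal sketch of how such a benchmark could look (hypothetical; the exact bench source is not part of this PR), assuming a nightly toolchain with the unstable `test` crate, and `black_box` to defeat constant folding:

```rust
#![feature(test)]
extern crate test;

use test::{black_box, Bencher};

// Each addition depends on the previous result, so the CPU cannot
// execute them in parallel; every extra add costs roughly one cycle.
#[bench]
fn add(b: &mut Bencher) {
    b.iter(|| black_box(1u64).wrapping_add(1));
}

#[bench]
fn add2(b: &mut Bencher) {
    b.iter(|| black_box(1u64).wrapping_add(1).wrapping_add(1));
}
```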
Force-pushed from 1d9e681 to 2a7c42f (Compare)
Some changes occurred in run-make tests. cc @jieyouxu
Do you know if there's a reason that it's always calculated in nanoseconds? Being most familiar with …
Switching units when doing a before/after comparison would make things more difficult to eyeball.
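For illustration, keeping nanoseconds as the unit but printing two decimal places resolves subnanosecond differences without making before/after comparisons harder. A rough sketch of the idea (`fmt_ns_per_iter` is a hypothetical helper, not libtest's actual code):

```rust
/// Format a per-iteration time, keeping nanoseconds as the unit so
/// before/after numbers stay directly comparable, while two decimal
/// places provide subnanosecond resolution.
fn fmt_ns_per_iter(total_ns: u64, iters: u64) -> String {
    let ns_per_iter = total_ns as f64 / iters as f64;
    format!("{:.2} ns/iter", ns_per_iter)
}

fn main() {
    // 24 ns over 100 iterations -> "0.24 ns/iter"
    println!("{}", fmt_ns_per_iter(24, 100));
}
```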
@bors r+
☀️ Test successful - checks-actions
Finished benchmarking commit (e93f342): comparison URL.

Overall result: no relevant changes - no action needed
@rustbot label: -perf-regression

Instruction count: this benchmark run did not return any relevant results for this metric.

Max RSS (memory usage): this is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

Cycles: this benchmark run did not return any relevant results for this metric.

Binary size: this benchmark run did not return any relevant results for this metric.

Bootstrap: 674.951s -> 674.435s (-0.08%)
With modern CPUs running at more than one cycle per nanosecond, the current precision is insufficient to resolve differences of several cycles per iteration.
Granted, walltime benchmarks are often noisy, but occasionally, especially when no allocations are involved, the difference really is just a few cycles.
Example results when benchmarking 1-4 serialized ADD instructions and an empty bench body are shown earlier in this thread.
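As a back-of-the-envelope check (the 4.2 GHz clock is an assumption picked to match the ~0.24 ns per add above), a single-cycle difference is well below 1 ns and disappears entirely when times are rounded to whole nanoseconds:

```rust
fn main() {
    let clock_ghz = 4.2; // assumed clock speed, not measured
    let ns_per_cycle = 1.0 / clock_ghz;
    println!("one cycle ~ {:.3} ns", ns_per_cycle);
    for cycles in 1..=4u32 {
        // With integer-nanosecond display, all four lines below would
        // round to 0 or 1 ns and be indistinguishable from each other.
        println!("{} serialized adds ~ {:.2} ns", cycles, cycles as f64 * ns_per_cycle);
    }
}
```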